Ad i gives rewards *y* drawn from a Bernoulli distribution: *p(y|θi) ~ Bernoulli(θi)*.
θi is unknown, so we express our uncertainty about it by assuming a uniform prior distribution *p(θi) ~ U([0, 1])*.
Bayes' Rule: We approximate θi by its posterior distribution:
*p(θi|y) ~ Beta(number of successes + 1, number of failures + 1)*
At each round n, we take a random draw θi(n) from this posterior distribution *p(θi|y)*, for each ad i.
At each round n we select the ad i that has the highest θi(n).
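To see why the posterior takes this Beta form, here is the Bayes-rule step written out as a short derivation. The symbols s and f below are just shorthand for the number of successes and failures observed for ad i so far (they correspond to the counts N1i(n) and N0i(n) used in the algorithm steps).

```latex
\begin{align*}
  p(\theta_i) &= 1
      \quad \text{(uniform prior on } [0,1] \text{, i.e. Beta}(1,1)) \\
  p(y \mid \theta_i) &= \theta_i^{\,s}\,(1-\theta_i)^{\,f}
      \quad \text{(likelihood of } s \text{ successes and } f \text{ failures)} \\
  p(\theta_i \mid y) &\propto p(y \mid \theta_i)\,p(\theta_i)
      = \theta_i^{\,s}\,(1-\theta_i)^{\,f}
      \;\;\Longrightarrow\;\; \theta_i \mid y \sim \mathrm{Beta}(s+1,\; f+1)
\end{align*}
```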
Step 1: At each round n, we consider two numbers for each ad i:
N1i(n) - the number of times ad i got reward 1 up to round n,
N0i(n) - the number of times ad i got reward 0 up to round n.
Step 2: For each ad i, we take a random draw from the distribution below:
θi(n) ~ Beta(N1i(n) + 1, N0i(n) + 1)
Step 3: We select the ad that has the highest θi(n).
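A minimal Python sketch of these three steps, assuming a function get_reward(i) that returns the observed 0/1 reward for showing ad i (a hypothetical stand-in for real click data):

```python
import random

def thompson_sampling(get_reward, n_ads, n_rounds):
    """Thompson Sampling for Bernoulli rewards (ad selection)."""
    successes = [0] * n_ads   # N1_i(n): times ad i returned reward 1
    failures = [0] * n_ads    # N0_i(n): times ad i returned reward 0
    total_reward = 0

    for n in range(n_rounds):
        # Step 2: draw theta_i(n) ~ Beta(N1_i(n) + 1, N0_i(n) + 1) for each ad
        draws = [random.betavariate(successes[i] + 1, failures[i] + 1)
                 for i in range(n_ads)]
        # Step 3: select the ad with the highest draw
        ad = max(range(n_ads), key=lambda i: draws[i])
        reward = get_reward(ad)
        # Step 1 bookkeeping: update the success/failure counts for that ad
        if reward == 1:
            successes[ad] += 1
        else:
            failures[ad] += 1
        total_reward += reward

    return total_reward, successes, failures

# Example usage with hypothetical true click-through rates for 5 ads
rates = [0.05, 0.13, 0.09, 0.16, 0.11]
reward, s, f = thompson_sampling(
    lambda i: 1 if random.random() < rates[i] else 0,
    n_ads=5, n_rounds=10000)
```

Over many rounds, the Beta posteriors of the worse ads stay wide while the best ad's posterior concentrates, so the algorithm gradually shifts from exploration to exploiting the highest-rate ad.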
Upper Confidence Bound | Thompson Sampling |
---|---|
Deterministic | Probabilistic |
Requires an update at every round | Can accommodate delayed feedback |
Less empirical evidence than Thompson Sampling | Better empirical evidence than UCB |